We study the effect of three factors on the root transcriptome of Arabidopsis thaliana and of the micro-tomato.
specie <- "At"  # species identifier, needed before the load() calls below
load(paste0("./GenesCO2_", specie, ".RData"))
load("./normalized.count_At.RData")
# quantification file (HTSeq-count output)
data <- read.csv("quantifFiles/QuantifGenes.csv", header = TRUE, sep = ",")
rownames(data) <- data$Gene
genes <- which(!grepl("__", rownames(data)))
not_quant <- data[grepl("__", rownames(data)), ]  # HTSeq summary rows (e.g. __no_feature)
data <- data[genes, grepl("R", colnames(data))]   # keep columns whose names contain "R"
keep <- rowSums(data) >= 10
data <- data[keep, ]
group <- sapply(colnames(data), getLabel, with.rep = FALSE)
colnames(data) <- sapply(colnames(data), getLabel)
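As a minimal illustration of the two filtering steps above (dropping HTSeq-count summary rows such as `__no_feature`, then weakly expressed genes), on a toy table:

```r
# toy count table with one HTSeq summary row
toy <- data.frame(R1 = c(5, 3, 100), R2 = c(6, 1, 90),
                  row.names = c("AT1G01010", "__no_feature", "AT1G01030"))
toy <- toy[!grepl("__", rownames(toy)), ]  # drop HTSeq summary lines
toy <- toy[rowSums(toy) >= 10, ]           # drop genes with fewer than 10 total counts
rownames(toy)                              # "AT1G01010" "AT1G01030"
```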
specie = "At"
clusteredGenes <- clustering(sharedBy3, data)
****************************************
coseq analysis: Poisson approach & none transformation
K = 2 to 12
Use set.seed() prior to running coseq for reproducible results.
****************************************
Running g = 2 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0"
Running g = 3 ...
[1] "Initialization: 1"
[1] "Log-like diff: 4.90653064844082e-08"
Running g = 4 ...
[1] "Initialization: 1"
[1] "Log-like diff: 9.85664883046411e-11"
Running g = 5 ...
[1] "Initialization: 1"
[1] "Log-like diff: 5.50989511793887e-08"
Running g = 6 ...
[1] "Initialization: 1"
[1] "Log-like diff: 1.78097423031431e-07"
Running g = 7 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0"
Running g = 8 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0"
Running g = 9 ...
[1] "Initialization: 1"
[1] "Log-like diff: 5.05190030253289e-07"
Running g = 10 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0"
Running g = 11 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0"
Running g = 12 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0"
$ICL
$profiles
$boxplots
$probapost_barplots
*************************************************
Model: Poisson
Transformation: none
*************************************************
Clusters fit: 2,3,4,5,6,7,8,9,10,11,12
Clusters with errors: ---
Selected number of clusters via ICL: 12
ICL of selected model: -740101.6
*************************************************
Number of clusters = 12
ICL = -740101.6
*************************************************
Cluster sizes:
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7
9 12 1 22 6 24 12
Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12
3 4 10 10 18
Number of observations with MAP > 0.90 (% of total):
131 (100%)
Number of observations with MAP > 0.90 per cluster (% of total per cluster):
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7
9 12 1 22 6 24 12
(100%) (100%) (100%) (100%) (100%) (100%) (100%)
Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12
3 4 10 10 18
(100%) (100%) (100%) (100%) (100%)
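The MAP summaries above come from coseq's posterior probabilities; the thresholding logic can be sketched in base R on a toy posterior matrix (names are illustrative, not coseq internals):

```r
# toy posterior-probability matrix: 4 genes x 3 clusters
probapost <- matrix(c(0.95, 0.03, 0.02,
                      0.50, 0.30, 0.20,
                      0.05, 0.92, 0.03,
                      0.10, 0.10, 0.80), nrow = 4, byrow = TRUE)
map_cluster <- max.col(probapost)                      # most probable cluster per gene
map_prob <- probapost[cbind(seq_len(4), map_cluster)]  # its posterior probability
sum(map_prob > 0.90)                                   # genes confidently assigned
```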
a$cluster <- clusteredGenes[a$ensembl_gene_id]
entrezID <- list()
nb_clust <- max(clusteredGenes)
for (clust in seq_len(nb_clust)) {
    # print(entrez[entrez$cluster == clust, ]$ensembl_transcript_id)
    entrezID[[length(entrezID) + 1]] <- na.omit(a[a$cluster == clust, ]$entrezgene_id)
}
names(entrezID) <- as.character(seq_len(nb_clust))
ck <- compareCluster(geneCluster = entrezID, fun = "enrichGO", OrgDb = org.At.tair.db,
    ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.01, qvalueCutoff = 0.05)
clusterProfiler::dotplot(ck, x = ~Cluster)
# We also tried another clustering with the MPLN package:
# mpln(dataset = as.matrix(data[sharedBy3, ]))
# Far too slow, even multithreaded, so it was abandoned.
From the MPLN documentation: Model-Based Clustering Using MPLN (Parallelized). Performs clustering using mixtures of multivariate Poisson-log normal (MPLN) distributions and model selection using AIC, AIC3, BIC and ICL. Since each component/cluster size (G) is independent from the others, all Gs in the range to be tested are parallelized to run on separate cores using the parallel R package.
Class: pca dudi
Call: dudi.pca(df = log(data + 0.1), center = TRUE, scale = TRUE, scannf = FALSE,
nf = 4)
Total inertia: 24
Eigenvalues:
Ax1 Ax2 Ax3 Ax4 Ax5
17.2861 4.7060 0.6320 0.5028 0.2615
Projected inertia (%):
Ax1 Ax2 Ax3 Ax4 Ax5
72.025 19.608 2.633 2.095 1.089
Cumulative projected inertia (%):
Ax1 Ax1:2 Ax1:3 Ax1:4 Ax1:5
72.03 91.63 94.27 96.36 97.45
(Only 5 dimensions (out of 24) are shown)
NULL
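The projected-inertia percentages in this dudi.pca summary are simply each eigenvalue divided by the total inertia (24, the number of scaled variables); for the first two axes:

```r
eig <- c(17.2861, 4.7060)  # first two eigenvalues from the summary above
total_inertia <- 24        # scaled PCA: total inertia = number of variables
round(100 * eig / total_inertia, 3)      # 72.025 19.608 (projected inertia %)
round(sum(100 * eig / total_inertia), 2) # 91.63 (cumulative, Ax1:2)
```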
We also try a PCA based on a Poisson log-normal model, from PLNmodels:
groups <- str_split_fixed(colnames(data), "_", 2)[, 1]
co2 <- str_split_fixed(groups, "", 3)[, 1]
nitrate <- factor(str_split_fixed(groups, "", 3)[, 2])
nitrate <- relevel(nitrate, "N")
fer <- factor(str_split_fixed(groups, "", 3)[, 3])
fer <- relevel(fer, "F")
covariates <- data.frame(row.names = colnames(data), co2, nitrate, fer)
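The three factors are encoded character-by-character in the group label (the exact label format, e.g. "cNF", is inferred from the relevel calls above); a base-R equivalent of the str_split_fixed parsing:

```r
groups <- c("cNF", "CnF", "cNf")  # toy labels: CO2 / nitrate / iron flags
co2     <- substr(groups, 1, 1)
nitrate <- relevel(factor(substr(groups, 2, 2)), "N")  # "N" as reference level
fer     <- relevel(factor(substr(groups, 3, 3)), "F")  # "F" as reference level
levels(nitrate)                                        # "N" comes first
```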
DEGenes <- sharedBy3
# data preparation
counts <- round(t(data[DEGenes, ]), 0)
plnData <- prepare_data(counts = counts, covariates = covariates)
PCA_models <- PLNPCA(Abundance ~ 1 + nitrate + fer + co2 + offset(log(Offset)), data = plnData,
ranks = 1:10)
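prepare_data computes, by default, total-count (TSS) offsets, which enter the model through offset(log(Offset)); the idea in base R (a sketch of the default behaviour, not PLNmodels' actual code):

```r
counts <- matrix(c(10, 20, 5,
                   100, 200, 50), nrow = 2, byrow = TRUE)  # 2 samples x 3 genes
Offset <- rowSums(counts)  # library size per sample
Offset                     # 35 350
log(Offset)                # enters the PLN regression as a fixed term
```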
Initialization...
Adjusting 10 PLN models for PCA analysis.
Rank approximation = 1
Rank approximation = 2
Rank approximation = 3
Rank approximation = 4
Rank approximation = 5
Rank approximation = 6
Rank approximation = 7
Rank approximation = 8
Rank approximation = 9
Rank approximation = 10
Post-treatments
DONE!
--------------------------------------------------------
COLLECTION OF 10 POISSON LOGNORMAL MODELS
--------------------------------------------------------
Task: Principal Component Analysis
========================================================
- Ranks considered: from 1 to 10
- Best model (greater BIC): rank = 10 - R2 = 0.97
- Best model (greater ICL): rank = 10 - R2 = 0.97
myPCA_ICL <- getBestModel(PCA_models, "ICL")
plot(myPCA_ICL, ind_cols = groups, var_cols = factor(clusteredGenes[rownames(data[DEGenes, ])]))
gridExtra::grid.arrange(plot(myPCA_ICL, ind_cols = groups, map = "individual", plot = FALSE),
    plot(myPCA_ICL, var_cols = factor(clusteredGenes[rownames(data[DEGenes, ])]),
        map = "variable", plot = FALSE), ncol = 2)
log.data <- log2(normalized.count[sharedBy3, ] + 1)
Norm.interest.corr <- corr.test(t(log.data), method = "pearson", ci = FALSE)
Norm.interest.corr$p[lower.tri(Norm.interest.corr$p, diag = TRUE)] <- NA
Pval.adj <- as.data.frame(as.table(Norm.interest.corr$p))
Norm.interest.corr$r[lower.tri(Norm.interest.corr$r, diag = TRUE)] <- NA
Correlation <- as.data.frame(as.table(Norm.interest.corr$r))
Cor.table <- na.exclude(cbind(Correlation, Pval.adj))[, c(1, 2, 3, 6)]
colnames(Cor.table) <- c("gene1", "gene2", "cor", "p.adj")
Cor.table.filt <- Cor.table[(abs(Cor.table[, 3]) > 0.9 & Cor.table[, 4] < 0.01), ]
g <- graph.data.frame(Cor.table.filt[, 1:2], directed = FALSE)
V(g)$color <- clusteredGenes[V(g)]
Node_nw_st <- netStats(g)
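netStats() is a custom helper not shown here; judging from its output, Rank_stat looks like a combination of the degree and betweenness rankings. A base-R sketch of that ranking logic (an assumption about the helper, not its actual code):

```r
degree      <- c(A = 10, B = 7, C = 7, D = 2)
betweenness <- c(A = 30, B = 50, C = 10, D = 5)
rank_stat <- (rank(degree) + rank(betweenness)) / 2  # higher = more central
names(sort(rank_stat, decreasing = TRUE))[1]         # "A" is the top hub
```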
Initialization...
Adjusting 30 PLN with sparse inverse covariance estimation
Joint optimization alternating gradient descent and graphical-lasso
sparsifying penalty = 1.673463
sparsifying penalty = 1.545729
sparsifying penalty = 1.427745
sparsifying penalty = 1.318766
sparsifying penalty = 1.218106
sparsifying penalty = 1.125129
sparsifying penalty = 1.039249
sparsifying penalty = 0.9599238
sparsifying penalty = 0.8866536
sparsifying penalty = 0.8189761
sparsifying penalty = 0.7564644
sparsifying penalty = 0.6987241
sparsifying penalty = 0.6453911
sparsifying penalty = 0.5961289
sparsifying penalty = 0.5506269
sparsifying penalty = 0.508598
sparsifying penalty = 0.4697772
sparsifying penalty = 0.4339195
sparsifying penalty = 0.4007988
sparsifying penalty = 0.3702062
sparsifying penalty = 0.3419487
sparsifying penalty = 0.315848
sparsifying penalty = 0.2917396
sparsifying penalty = 0.2694714
sparsifying penalty = 0.2489028
sparsifying penalty = 0.2299043
sparsifying penalty = 0.2123559
sparsifying penalty = 0.196147
sparsifying penalty = 0.1811752
sparsifying penalty = 0.1673463
Post-treatments
DONE!
Stability Selection for PLNnetwork:
subsampling: ++++++++++++++++++++
V(net_norm)$color <- clusteredGenes[V(net_norm)]
plot.igraph(net_norm, vertex.size = 10, vertex.label.cex = 0.5, edge.width = 0.5)
[1] 431
[1] 476
****************************************
coseq analysis: Poisson approach & none transformation
K = 2 to 12
Use set.seed() prior to running coseq for reproducible results.
****************************************
Running g = 2 ...
[1] "Initialization: 1"
[1] "Log-like diff: 4.59017712728382e-10"
Running g = 3 ...
[1] "Initialization: 1"
[1] "Log-like diff: 1411.59027483396"
[1] "Log-like diff: 135.834263142698"
[1] "Log-like diff: 4.87166334779407"
[1] "Log-like diff: 0.163549842887022"
[1] "Log-like diff: 0.00440584508602626"
Running g = 4 ...
[1] "Initialization: 1"
[1] "Log-like diff: 1.21945280540103e-07"
Running g = 5 ...
[1] "Initialization: 1"
[1] "Log-like diff: 94.168448285313"
[1] "Log-like diff: 27.8326764519581"
[1] "Log-like diff: 0.0538298186603754"
[1] "Log-like diff: 0.00039741358597567"
[1] "Log-like diff: 5.95410822157305e-06"
Running g = 6 ...
[1] "Initialization: 1"
[1] "Log-like diff: 4.55134596677453e-10"
Running g = 7 ...
[1] "Initialization: 1"
[1] "Log-like diff: 2.99828419869641e-05"
[1] "Log-like diff: 1.15869511319033e-06"
Running g = 8 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0.00324992917408906"
[1] "Log-like diff: 8.9456142298161e-05"
[1] "Log-like diff: 7.87921585754248e-06"
Running g = 9 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0.0027331480212851"
[1] "Log-like diff: 3.20043587862529e-05"
[1] "Log-like diff: 4.71615422270588e-07"
Running g = 10 ...
[1] "Initialization: 1"
[1] "Log-like diff: 180.385205910929"
[1] "Log-like diff: 0.132498221681843"
[1] "Log-like diff: 0.000265799264397515"
[1] "Log-like diff: 3.29221322772355e-05"
[1] "Log-like diff: 5.96817702813723e-07"
Running g = 11 ...
[1] "Initialization: 1"
[1] "Log-like diff: 7.89839428474253"
[1] "Log-like diff: 0.914622663010462"
[1] "Log-like diff: 0.527449866368762"
[1] "Log-like diff: 0.244277441686506"
[1] "Log-like diff: 0.112796925842808"
Running g = 12 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0.00324309141651824"
[1] "Log-like diff: 0.000818172500085979"
[1] "Log-like diff: 0.000206378430380738"
[1] "Log-like diff: 5.20935887955432e-05"
[1] "Log-like diff: 1.31237750160551e-05"
$ICL
$profiles
$boxplots
$probapost_barplots
*************************************************
Model: Poisson
Transformation: none
*************************************************
Clusters fit: 2,3,4,5,6,7,8,9,10,11,12
Clusters with errors: ---
Selected number of clusters via ICL: 12
ICL of selected model: -3013888
*************************************************
Number of clusters = 12
ICL = -3013888
*************************************************
Cluster sizes:
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7
63 98 74 47 103 38 21
Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12
160 5 24 154 50
Number of observations with MAP > 0.90 (% of total):
833 (99.52%)
Number of observations with MAP > 0.90 per cluster (% of total per cluster):
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7
63 98 74 47 103 37 21
(100%) (100%) (100%) (100%) (100%) (97.37%) (100%)
Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12
159 5 24 153 49
(99.38%) (100%) (100%) (99.35%) (98%)
a$cluster <- clusteredGenes[a$ensembl_gene_id]
entrezID <- list()
nb_clust <- max(clusteredGenes)
for (clust in seq_len(nb_clust)) {
    # print(entrez[entrez$cluster == clust, ]$ensembl_transcript_id)
    entrezID[[length(entrezID) + 1]] <- na.omit(a[a$cluster == clust, ]$entrezgene_id)
}
names(entrezID) <- as.character(seq_len(nb_clust))
ck <- compareCluster(geneCluster = entrezID, fun = "enrichGO", OrgDb = org.At.tair.db,
    ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.01, qvalueCutoff = 0.05)
clusterProfiler::dotplot(ck, x = ~Cluster)
Class: pca dudi
Call: dudi.pca(df = log(data + 0.1), center = TRUE, scale = TRUE, scannf = FALSE,
nf = 4)
Total inertia: 24
Eigenvalues:
Ax1 Ax2 Ax3 Ax4 Ax5
19.1712 3.3535 0.5249 0.3943 0.1205
Projected inertia (%):
Ax1 Ax2 Ax3 Ax4 Ax5
79.880 13.973 2.187 1.643 0.502
Cumulative projected inertia (%):
Ax1 Ax1:2 Ax1:3 Ax1:4 Ax1:5
79.88 93.85 96.04 97.68 98.18
(Only 5 dimensions (out of 24) are shown)
NULL
groups <- str_split_fixed(colnames(data), "_", 2)[, 1]
co2 <- str_split_fixed(groups, "", 3)[, 1]
nitrate <- factor(str_split_fixed(groups, "", 3)[, 2])
nitrate <- relevel(nitrate, "N")
fer <- factor(str_split_fixed(groups, "", 3)[, 3])
fer <- relevel(fer, "F")
covariates <- data.frame(row.names = colnames(data), co2, nitrate, fer)
sample_genes <- sample(sharedBy3, 400)
DEGenes <- sample_genes
# data preparation
counts <- round(t(data[DEGenes, ]), 0)
plnData <- prepare_data(counts = counts, covariates = covariates)
PCA_models <- PLNPCA(Abundance ~ 1 + nitrate + fer + co2 + offset(log(Offset)), data = plnData,
ranks = 1:10)
Initialization...
Adjusting 10 PLN models for PCA analysis.
Rank approximation = 1
Rank approximation = 2
Rank approximation = 3
Rank approximation = 4
Rank approximation = 5
Rank approximation = 6
Rank approximation = 7
Rank approximation = 8
Rank approximation = 9
Rank approximation = 10
Post-treatments
DONE!
--------------------------------------------------------
COLLECTION OF 10 POISSON LOGNORMAL MODELS
--------------------------------------------------------
Task: Principal Component Analysis
========================================================
- Ranks considered: from 1 to 10
- Best model (greater BIC): rank = 10 - R2 = 0.97
- Best model (greater ICL): rank = 10 - R2 = 0.97
myPCA_ICL <- getBestModel(PCA_models, "ICL")
plot(myPCA_ICL, ind_cols = groups, var_cols = factor(clusteredGenes[rownames(data[DEGenes, ])]))
gridExtra::grid.arrange(plot(myPCA_ICL, ind_cols = groups, map = "individual", plot = FALSE),
    plot(myPCA_ICL, var_cols = factor(clusteredGenes[rownames(data[DEGenes, ])]),
        map = "variable", plot = FALSE), ncol = 2)
log.data <- log2(normalized.count[sharedBy3, ] + 1)
Norm.interest.corr <- corr.test(t(log.data), method = "pearson", ci = FALSE)
Norm.interest.corr$p[lower.tri(Norm.interest.corr$p, diag = TRUE)] <- NA
Pval.adj <- as.data.frame(as.table(Norm.interest.corr$p))
Norm.interest.corr$r[lower.tri(Norm.interest.corr$r, diag = TRUE)] <- NA
Correlation <- as.data.frame(as.table(Norm.interest.corr$r))
Cor.table <- na.exclude(cbind(Correlation, Pval.adj))[, c(1, 2, 3, 6)]
colnames(Cor.table) <- c("gene1", "gene2", "cor", "p.adj")
Cor.table.filt <- Cor.table[(abs(Cor.table[, 3]) > 0.9 & Cor.table[, 4] < 0.01), ]
g <- graph.data.frame(Cor.table.filt[, 1:2], directed = FALSE)
V(g)$color <- clusteredGenes[V(g)]
netStats(g)
          degree betweenness Rank_stat
AT4G32950 168 4195.902 684.50
AT2G30660 177 3893.654 682.00
AT5G39130 157 4167.511 680.00
AT1G74590 163 3818.345 679.00
AT5G54370 182 3334.164 677.00
AT3G44300 136 7640.805 673.50
AT3G16180 147 3766.436 671.00
AT1G52070 136 4848.893 669.00
AT5G24070 134 3813.655 659.00
AT2G46680 166 2202.663 658.50
AT4G23400 152 2417.920 658.25
AT2G31880 130 4797.134 657.50
AT1G69490 159 2110.001 654.50
AT2G30670 146 2258.923 653.00
AT1G20180 153 1838.916 644.00
AT2G21880 127 3149.782 643.50
AT5G06730 164 1721.279 643.00
AT5G38900 122 3819.292 639.00
AT1G01680 135 2184.533 638.75
AT5G46890 162 1569.691 633.00
AT1G22530 124 2724.852 632.50
AT5G22270 130 2131.515 629.00
AT4G25250 113 4488.276 628.00
AT4G27400 134 1838.540 627.00
AT1G25400 140 1671.199 626.25
[ reached 'max' / getOption("max.print") -- omitted 670 rows ]
Initialization...
Adjusting 30 PLN with sparse inverse covariance estimation
Joint optimization alternating gradient descent and graphical-lasso
sparsifying penalty = 1.07978
sparsifying penalty = 0.9973615
sparsifying penalty = 0.9212337
sparsifying penalty = 0.8509167
sparsifying penalty = 0.785967
sparsifying penalty = 0.7259748
sparsifying penalty = 0.6705618
sparsifying penalty = 0.6193784
sparsifying penalty = 0.5721018
sparsifying penalty = 0.5284337
sparsifying penalty = 0.4880988
sparsifying penalty = 0.4508427
sparsifying penalty = 0.4164302
sparsifying penalty = 0.3846445
sparsifying penalty = 0.3552849
sparsifying penalty = 0.3281663
sparsifying penalty = 0.3031176
sparsifying penalty = 0.2799809
sparsifying penalty = 0.2586102
sparsifying penalty = 0.2388707
sparsifying penalty = 0.2206379
sparsifying penalty = 0.2037968
sparsifying penalty = 0.1882412
sparsifying penalty = 0.1738729
sparsifying penalty = 0.1606013
sparsifying penalty = 0.1483428
sparsifying penalty = 0.1370199
sparsifying penalty = 0.1265613
sparsifying penalty = 0.116901
sparsifying penalty = 0.107978
Post-treatments
DONE!
Stability Selection for PLNnetwork:
subsampling: ++++++++++++++++++++
V(net)$color <- clusteredGenes[V(net)]
# V(net)$size <- degree[V(net)]*0.1
plot.igraph(net, vertex.size = 10, vertex.label.cex = 0.4, edge.width = 0.5)
          degree betweenness Rank_stat
AT1G21240 158 16953.1888 400.00
AT1G33820 134 12667.9013 399.00
AT3G53770 95 3946.6477 397.00
AT5G55110 103 2920.1525 397.00
AT1G34047 57 4572.2856 395.50
AT4G27400 69 1640.9440 395.00
AT5G46890 63 1462.3665 394.00
AT1G33960 61 935.9150 391.00
AT1G10070 37 2611.7509 390.25
AT5G43590 55 966.5146 390.00
AT1G08430 47 1201.3616 389.50
AT4G32950 56 765.7225 389.00
AT1G52130 48 761.4761 387.00
AT2G41850 49 508.6514 387.00
AT2G23270 34 1081.6611 385.75
AT5G20790 35 889.9951 385.00
AT1G15125 28 1392.5802 384.50
AT1G51840 37 342.9828 381.75
AT1G01680 36 375.8168 381.50
AT2G30670 41 251.8479 380.50
AT2G45760 28 506.1166 380.00
AT2G40740 34 321.6113 379.25
AT1G63590 28 422.9557 379.00
AT4G24350 25 445.6824 376.50
AT5G35525 32 235.7482 374.50
[ reached 'max' / getOption("max.print") -- omitted 375 rows ]
****************************************
coseq analysis: Poisson approach & none transformation
K = 2 to 12
Use set.seed() prior to running coseq for reproducible results.
****************************************
Running g = 2 ...
[1] "Initialization: 1"
[1] "Log-like diff: 1.06819442180495e-10"
Running g = 3 ...
[1] "Initialization: 1"
[1] "Log-like diff: 6.38532489389831e-05"
[1] "Log-like diff: 6.09656783900903e-06"
Running g = 4 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0.0111170800072813"
[1] "Log-like diff: 0.00141139826298442"
[1] "Log-like diff: 0.000161397428641408"
[1] "Log-like diff: 2.2397745002678e-05"
[1] "Log-like diff: 2.78974421874523e-06"
Running g = 5 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0.0387118669603943"
[1] "Log-like diff: 0.00732243729624926"
[1] "Log-like diff: 0.00136628427148366"
[1] "Log-like diff: 0.000246221735089591"
[1] "Log-like diff: 4.5249045658835e-05"
Running g = 6 ...
[1] "Initialization: 1"
[1] "Log-like diff: 1091.54513354431"
[1] "Log-like diff: 384.306342804397"
[1] "Log-like diff: 471.016220532387"
[1] "Log-like diff: 427.086443409924"
[1] "Log-like diff: 664.326791316329"
Running g = 7 ...
[1] "Initialization: 1"
[1] "Log-like diff: 427.257427847843"
[1] "Log-like diff: 680.01581885371"
[1] "Log-like diff: 1786.85942209259"
[1] "Log-like diff: 1583.70876190583"
[1] "Log-like diff: 303.739086114212"
Running g = 8 ...
[1] "Initialization: 1"
[1] "Log-like diff: 241.58230457968"
[1] "Log-like diff: 364.01646505909"
[1] "Log-like diff: 398.598735924692"
[1] "Log-like diff: 2.65393254768138"
[1] "Log-like diff: 112.913902215817"
Running g = 9 ...
[1] "Initialization: 1"
[1] "Log-like diff: 416.451620701424"
[1] "Log-like diff: 358.536512856528"
[1] "Log-like diff: 51.7726131046729"
[1] "Log-like diff: 281.540823604593"
[1] "Log-like diff: 92.5768743207845"
Running g = 10 ...
[1] "Initialization: 1"
[1] "Log-like diff: 24.1783541782546"
[1] "Log-like diff: 17.4979676949227"
[1] "Log-like diff: 1.99520130713022"
[1] "Log-like diff: 23.5706044092903"
[1] "Log-like diff: 4.05590908308703"
Running g = 11 ...
[1] "Initialization: 1"
[1] "Log-like diff: 1212.67923420344"
[1] "Log-like diff: 1457.72471004209"
[1] "Log-like diff: 798.015375131625"
[1] "Log-like diff: 275.736282250256"
[1] "Log-like diff: 30.0350562940232"
Running g = 12 ...
[1] "Initialization: 1"
[1] "Log-like diff: 670.628345670554"
[1] "Log-like diff: 334.935422347285"
[1] "Log-like diff: 325.249048271772"
[1] "Log-like diff: 393.053680612322"
[1] "Log-like diff: 71.9330493252254"
$ICL
$profiles
$boxplots
$probapost_barplots
*************************************************
Model: Poisson
Transformation: none
*************************************************
Clusters fit: 2,3,4,5,6,7,8,9,10,11,12
Clusters with errors: ---
Selected number of clusters via ICL: 12
ICL of selected model: -3391000
*************************************************
Number of clusters = 12
ICL = -3391000
*************************************************
Cluster sizes:
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7
107 164 30 638 337 43 433
Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12
196 122 51 113 607
Number of observations with MAP > 0.90 (% of total):
2769 (97.47%)
Number of observations with MAP > 0.90 per cluster (% of total per cluster):
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7
98 158 28 626 328 40 421
(91.59%) (96.34%) (93.33%) (98.12%) (97.33%) (93.02%) (97.23%)
Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12
195 118 51 107 599
(99.49%) (96.72%) (100%) (94.69%) (98.68%)
# a <- OntologyProfile(sharedBy3)
# a$cluster <- clusteredGenes[a$ensembl_gene_id]
# entrezID <- list()
# nb_clust <- max(clusteredGenes)
# for (clust in seq_len(nb_clust)) {
#     # print(entrez[entrez$cluster == clust, ]$ensembl_transcript_id)
#     entrezID[[length(entrezID) + 1]] <- na.omit(a[a$cluster == clust, ]$entrezgene_id)
# }
# names(entrezID) <- as.character(seq_len(nb_clust))
# ck <- compareCluster(geneCluster = entrezID, fun = 'enrichGO', OrgDb = org.At.tair.db,
#     ont = 'BP', pAdjustMethod = 'BH', pvalueCutoff = 0.01, qvalueCutoff = 0.05)
# clusterProfiler::dotplot(ck, x = ~Cluster)
sample_genes <- sample(sharedBy3, 400)
groups <- str_split_fixed(colnames(data), "_", 2)[, 1]
co2 <- str_split_fixed(groups, "", 3)[, 1]
nitrate <- factor(str_split_fixed(groups, "", 3)[, 2])
nitrate <- relevel(nitrate, "N")
fer <- factor(str_split_fixed(groups, "", 3)[, 3])
fer <- relevel(fer, "F")
covariates <- data.frame(row.names = colnames(data), co2, nitrate, fer)
DEGenes <- sample_genes
# data preparation
counts <- round(t(data[DEGenes, ]), 0)
plnData <- prepare_data(counts = counts, covariates = covariates)
PCA_models <- PLNPCA(Abundance ~ 1 + nitrate + fer + co2 + offset(log(Offset)), data = plnData,
ranks = 1:10)
Initialization...
Adjusting 10 PLN models for PCA analysis.
Rank approximation = 1
Rank approximation = 2
Rank approximation = 3
Rank approximation = 4
Rank approximation = 5
Rank approximation = 6
Rank approximation = 7
Rank approximation = 8
Rank approximation = 9
Rank approximation = 10
Post-treatments
DONE!
--------------------------------------------------------
COLLECTION OF 10 POISSON LOGNORMAL MODELS
--------------------------------------------------------
Task: Principal Component Analysis
========================================================
- Ranks considered: from 1 to 10
- Best model (greater BIC): rank = 10 - R2 = 0.96
- Best model (greater ICL): rank = 10 - R2 = 0.96
myPCA_ICL <- getBestModel(PCA_models, "ICL")
plot(myPCA_ICL, ind_cols = groups, var_cols = factor(clusteredGenes[rownames(data[DEGenes, ])]))
gridExtra::grid.arrange(plot(myPCA_ICL, ind_cols = groups, map = "individual", plot = FALSE),
    plot(myPCA_ICL, var_cols = factor(clusteredGenes[rownames(data[DEGenes, ])]),
        map = "variable", plot = FALSE), ncol = 2)
Class: pca dudi
Call: dudi.pca(df = log(data + 0.1), center = TRUE, scale = TRUE, scannf = FALSE,
nf = 4)
Total inertia: 24
Eigenvalues:
Ax1 Ax2 Ax3 Ax4 Ax5
22.01676 1.12356 0.26987 0.11457 0.06979
Projected inertia (%):
Ax1 Ax2 Ax3 Ax4 Ax5
91.7365 4.6815 1.1245 0.4774 0.2908
Cumulative projected inertia (%):
Ax1 Ax1:2 Ax1:3 Ax1:4 Ax1:5
91.74 96.42 97.54 98.02 98.31
(Only 5 dimensions (out of 24) are shown)
NULL
log.data <- log2(normalized.count[sample_genes, ] + 1)
Norm.interest.corr <- corr.test(t(log.data), method = "pearson", ci = FALSE)
Norm.interest.corr$p[lower.tri(Norm.interest.corr$p, diag = TRUE)] <- NA
Pval.adj <- as.data.frame(as.table(Norm.interest.corr$p))
Norm.interest.corr$r[lower.tri(Norm.interest.corr$r, diag = TRUE)] <- NA
Correlation <- as.data.frame(as.table(Norm.interest.corr$r))
Cor.table <- na.exclude(cbind(Correlation, Pval.adj))[, c(1, 2, 3, 6)]
colnames(Cor.table) <- c("gene1", "gene2", "cor", "p.adj")
Cor.table.filt <- Cor.table[(abs(Cor.table[, 3]) > 0.9 & Cor.table[, 4] < 0.01), ]
g <- graph.data.frame(Cor.table.filt[, 1:2], directed = FALSE)
V(g)$color <- clusteredGenes[V(g)]
netStats(g)
          degree betweenness Rank_stat
AT5G51070 59 2085.0820 278.50
AT5G53350 46 5251.7184 277.50
AT5G27350 53 1343.7899 271.50
AT5G24810 45 1625.2303 270.25
AT1G16120 57 1211.4976 269.50
AT3G59210 41 1847.0837 269.50
AT4G21534 41 1686.1707 268.50
AT5G60530 48 1185.7926 267.00
AT4G30790 52 781.6829 262.50
AT1G09935 41 1140.4143 261.00
AT3G49490 26 2830.0678 259.50
AT3G25660 32 1379.9615 259.25
AT1G50430 26 1986.4049 256.50
AT5G44510 46 663.6271 255.50
AT5G13710 30 1272.7125 255.25
AT3G03890 45 667.0113 254.25
AT5G14420 50 586.0264 252.50
AT5G40010 46 584.1702 250.50
AT4G04920 35 774.2882 250.50
AT3G09960 39 652.8921 249.25
AT5G08660 42 541.1541 245.50
AT1G51470 34 630.4088 245.00
AT4G23850 39 566.8410 244.25
AT4G24760 33 610.1257 244.00
AT4G22740 16 7769.8295 243.00
[ reached 'max' / getOption("max.print") -- omitted 257 rows ]
Initialization...
Adjusting 30 PLN with sparse inverse covariance estimation
Joint optimization alternating gradient descent and graphical-lasso
sparsifying penalty = 1.240511
sparsifying penalty = 1.145824
sparsifying penalty = 1.058364
sparsifying penalty = 0.9775802
sparsifying penalty = 0.9029624
sparsifying penalty = 0.83404
sparsifying penalty = 0.7703785
sparsifying penalty = 0.7115761
sparsifying penalty = 0.6572621
sparsifying penalty = 0.6070939
sparsifying penalty = 0.5607549
sparsifying penalty = 0.517953
sparsifying penalty = 0.4784181
sparsifying penalty = 0.4419008
sparsifying penalty = 0.4081709
sparsifying penalty = 0.3770156
sparsifying penalty = 0.3482383
sparsifying penalty = 0.3216576
sparsifying penalty = 0.2971057
sparsifying penalty = 0.2744279
sparsifying penalty = 0.2534811
sparsifying penalty = 0.2341331
sparsifying penalty = 0.2162619
sparsifying penalty = 0.1997548
sparsifying penalty = 0.1845077
sparsifying penalty = 0.1704244
sparsifying penalty = 0.157416
sparsifying penalty = 0.1454006
sparsifying penalty = 0.1343023
sparsifying penalty = 0.1240511
Post-treatments
DONE!
Stability Selection for PLNnetwork:
subsampling: ++++++++++++++++++++
IGRAPH fc0e129 UNW- 400 474 --
+ attr: name (v/c), label (v/c), label.cex (v/n), size (v/n),
| label.color (v/c), frame.color (v/l), weight (e/n), color (e/c),
| width (e/n)
+ edges from fc0e129 (vertex names):
[1] AT4G36820--AT1G10070 AT4G36820--AT5G38910 AT4G36820--AT5G66890
[4] AT4G36820--AT3G48740 AT4G36820--AT1G55790 AT4G36820--AT3G02150
[7] AT4G36820--AT2G19500 AT4G36820--AT3G53770 AT4G36820--AT2G25625
[10] AT1G51470--AT5G38910 AT1G51470--AT1G47600 AT1G51470--AT3G53770
[13] AT1G51470--AT1G50720 AT1G65500--AT5G38910 AT1G65500--AT5G66890
[16] AT1G65500--AT1G09935 AT1G65500--AT5G09570 AT1G65500--AT5G10760
+ ... omitted several edges
Initialization...
Adjusting 30 PLN with sparse inverse covariance estimation
Joint optimization alternating gradient descent and graphical-lasso
sparsifying penalty = 1.240511
sparsifying penalty = 1.145824
sparsifying penalty = 1.058364
sparsifying penalty = 0.9775802
sparsifying penalty = 0.9029624
sparsifying penalty = 0.83404
sparsifying penalty = 0.7703785
sparsifying penalty = 0.7115761
sparsifying penalty = 0.6572621
sparsifying penalty = 0.6070939
sparsifying penalty = 0.5607549
sparsifying penalty = 0.517953
sparsifying penalty = 0.4784181
sparsifying penalty = 0.4419008
sparsifying penalty = 0.4081709
sparsifying penalty = 0.3770156
sparsifying penalty = 0.3482383
sparsifying penalty = 0.3216576
sparsifying penalty = 0.2971057
sparsifying penalty = 0.2744279
sparsifying penalty = 0.2534811
sparsifying penalty = 0.2341331
sparsifying penalty = 0.2162619
sparsifying penalty = 0.1997548
sparsifying penalty = 0.1845077
sparsifying penalty = 0.1704244
sparsifying penalty = 0.157416
sparsifying penalty = 0.1454006
sparsifying penalty = 0.1343023
sparsifying penalty = 0.1240511
Post-treatments
DONE!
Stability Selection for PLNnetwork:
subsampling: ++++++++++++++++++++
V(net)$color <- clusteredGenes[V(net)]
# V(net)$size <- degree[V(net)]*0.1
plot.igraph(net, vertex.size = 10, vertex.label.cex = 0.4, edge.width = 0.5)
          degree betweenness Rank_stat
AT5G38910 132 9618.96649 400.00
AT5G09570 58 1788.52677 399.00
AT5G66890 42 669.82971 398.00
AT3G53770 41 407.52131 396.50
AT4G23140 25 490.16940 395.50
AT1G50720 31 390.03074 395.50
AT2G25625 26 356.60922 394.50
AT3G48740 19 301.23424 392.00
AT1G10070 18 313.16554 391.75
AT5G10760 18 220.20171 389.75
AT4G10860 24 132.89718 388.00
AT3G42550 14 184.38693 387.75
AT5G46960 16 147.88541 386.50
AT4G27150 12 164.55921 385.25
AT5G16970 10 278.10093 385.25
AT4G01390 11 176.83155 384.75
AT1G47600 14 62.48665 383.25
AT2G19500 12 46.05371 380.75
AT5G43650 9 75.01898 379.00
AT1G09935 11 43.04814 378.75
AT1G55790 9 46.20298 377.00
AT5G46610 11 31.25905 375.75
AT1G13310 10 33.40311 375.25
AT2G15880 5 230.57348 375.25
AT4G36820 9 37.74033 374.50
[ reached 'max' / getOption("max.print") -- omitted 375 rows ]
CO2 network
Tomato seems to respond differently in some respects: a stronger CO2 effect, a weaker iron effect, and a broadly similar nitrate effect.
Ontologies are less enriched for tomato.
CO2 network
Shared DEGs across the 4 possible comparisons for the effect of each factor (Venn diagrams)
Computed on the full transcriptomes (broader than the transcriptomes on which the DEGs were detected)
Relevance network built as in Rodrigo et al.: thresholds on the correlation value and on the p-value; genes ranked by centrality and connectivity
Visualized in igraph after clustering
Visualized in Cytoscape, then community clustering (plugins) and ontology enrichment analysis
CO2 network
Regression: transcription factors are selected by target-gene-specific sparse linear regression and data-resampling approaches.
Bayesian networks: posterior probabilities are optimized by different heuristic searches.
Correlation: edges are ranked based on variants of correlation.
Mixed methods: consensus between different techniques (the conclusion of the meta-analysis and benchmarks of the DREAM5 project: "wisdom of crowds").
Others: GENIE3 (random forests), neural networks, chi-squared. Mutual information: edges are (1) ranked based on variants of mutual information and (2) filtered for causal relationships.
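Of these, the mutual-information ranking is the easiest to sketch in base R, by discretizing expression values and computing MI per gene pair (toy data, not the DREAM5 implementations):

```r
mi <- function(x, y, nbins = 4) {
  p <- table(cut(x, nbins), cut(y, nbins)) / length(x)  # joint distribution
  px <- rowSums(p); py <- colSums(p)                    # marginals
  nz <- p > 0
  sum(p[nz] * log(p[nz] / outer(px, py)[nz]))           # MI in nats
}
set.seed(3)
x <- rnorm(500); y <- x + 0.2 * rnorm(500); z <- rnorm(500)
mi(x, y) > mi(x, z)  # the dependent pair ranks above the independent one
```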
GGM inference notes:
Ideally we would infer one network per condition, but since data is scarce we instead use all conditions for a single network: multitask learning.
Assumptions:
Few interactions: sparse adjacency matrix
Networks are structured
Partial correlation: the correlation between two genes given all the others. Two genes are conditionally independent iff their partial correlation is \(0\).
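Partial correlations can be read off the precision (inverse covariance) matrix \(\Theta\) as \(\rho_{ij} = -\theta_{ij}/\sqrt{\theta_{ii}\theta_{jj}}\); a base-R sketch:

```r
set.seed(1)
X <- matrix(rnorm(300), ncol = 3)    # 100 samples, 3 "genes"
X[, 3] <- X[, 1] + 0.5 * rnorm(100)  # gene 3 driven by gene 1
Theta <- solve(cov(X))               # precision matrix
pcor <- -Theta / sqrt(diag(Theta) %o% diag(Theta))
diag(pcor) <- 1
pcor[1, 3]  # large: genes 1 and 3 are partially correlated
```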
The general principle is to optimize a likelihood function (the probability of the data given a set of parameters \(\Theta\)) minus a penalty enforcing the desired assumptions, tuned by \(\lambda\) (a kind of prior). The penalty enables regularization for large numbers of genes, selection for sparsity, and model-driven inference.
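For a Gaussian graphical model, this penalized objective is the graphical lasso:

\[
\hat{\Theta} = \arg\max_{\Theta \succ 0}\; \log\det\Theta - \mathrm{tr}(S\Theta) - \lambda \lVert \Theta \rVert_1,
\]

where \(S\) is the empirical covariance matrix and \(\lVert \Theta \rVert_1\) drives entries of \(\Theta\) (i.e. missing edges) to zero.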
The function to optimize can vary:
- Meinshausen and Bühlmann (2006)
If \(X_i\) is a column of \(X\), we write \(X_i = X_{\setminus i}\beta + \epsilon\): we try to predict the expression of one gene from all the others, with \(\beta_j = -\frac{\theta_{ij}}{\theta_{ii}}\).
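The neighborhood-selection idea, with plain least squares standing in for the lasso used in practice, just to show the regression structure:

```r
set.seed(2)
X <- matrix(rnorm(200), ncol = 4)                # 50 samples, 4 "genes"
X[, 1] <- 2 * X[, 2] - X[, 3] + 0.1 * rnorm(50)  # gene 1 depends on genes 2 and 3
fit <- lm(X[, 1] ~ X[, -1])                      # regress gene 1 on all the others
coef(fit)[-1]                                    # close to (2, -1, 0)
```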